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This listing of claims will replace all prior versions, and listings, of claims in 
the application: 

The Status of the Claims 

1. (Original) A method of reducing memory latency in a software 
application, the method comprising: 

analyzing the software application and determining a first area 
of software instructions that encounters a cache miss; 
generating a helper thread; 

generating a first set of compiler-runtime instructions and 
inserting the first set of compiler-runtime instructions in a main thread; 

generating a second set of compiler-runtime instructions and 
inserting the second set of compiler-runtime instructions in the helper thread; 
and 

inserting a counting mechanism in the main thread and the 
helper thread, the counting mechanism being structured to coordinate relative 
execution points of the main thread and the helper thread. 

2. (Original) A method as defined in claim 1 , further comprising 
analyzing the software application and determining a second area of software 
instructions which encounters a memory load latency. 

3. (Original) A method as defined in claim 2, wherein the first 
area of software instructions is different than the second area of software 
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instructions. 

4. (Original) A method as defined in claim 2, wherein the first 
area of software instructions comprises the second area of software 
instructions. 

5. (Original) A method as defined in claim 2, wherein analyzing 
the software application comprises: 

measuring cache miss rates associated with the software 
application using a performance analysis tool; 

measuring memory load latencies associated with the software 
application using the performance analysis tool; 

reporting the first area of software instructions which 
encounters the cache miss to a compiler; and 

reporting the second area of software instructions which 
encounters the memory load latency. 

6. (Original) A method as defined in claim 1, wherein generating 
a helper thread includes generating a thread graph. 

7. (Original) A method as defined in claim 6, wherein the thread 
graph presents a data structure that represents a relationship between the main 
thread and the helper thread. 
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8. (Original) A method as defined in claim 6, wherein the thread 
graph facilitates code reuse. 

9. (Original) A method as defined in claim 1, wherein at a least 
part of the first set of compiler-runtime instructions comprises at least a part of 
the second set of compiler-runtime instructions. 

10. (Original) A method as defined in claim 1, wherein the first set 
of compiler-runtime instructions inserted in the main thread comprises 
instructions to spawn the helper thread, terminate the helper thread, and 
coordinate execution of the helper thread and the main thread. 

1 1 . (Original) A method as defined in claim 1 , wherein the second 
set of compiler-runtime instructions inserted in the helper thread comprises 
instructions to coordinate execution of the helper thread and the main thread. 

12. (Original) A method as defined in claim 1, wherein the 
counting mechanism comprises a software counter. 

13. (Original) A method as defined in claim 12, wherein at least 
one of the first set of compiler-runtime instructions and the second set of 
compiler-runtime instructions include instructions to control execution rates of 
the helper thread based on a value associated with the software counter. 
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14. (Original) A method as defined in claim 12, wherein at least 
one of the first set of compiler-runtime instructions and the second set of 
compiler-runtime instructions include instructions to control execution rates of 
the main thread based on a value associated with the software counter. 

15. (Original) A method as defined in claim 14, wherein the 
compiler-runtime instructions to control execution rates comprises a delay 
instruction, a catch up instruction and an instruction to force execution. 

16. (Original) A system to reduce memory latency, the system 
comprising: 

a processor; 

a memory operatively coupled to the processor: 

the memory storing a software tool structured to identify a code 
region in an application program that suffers from a data cache miss; 

a compiler operatively coupled to the software tool, the 
compiler being structured to receive information from the software tool and to 
generate a helper thread; 

a set of compiler-runtime instructions to be generated and 
inserted in the application program to manage the helper thread and to manage 
a main thread; and 

a counting mechanism for insertion in the main thread and the 
helper thread to facilitate coordination of execution points associated with the 
helper thread and the main thread. 
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17. (Original) A system as defined in claim 16, wherein the 
software tool comprises a VTune™ Performance Anaylyzer. 

18. (Original) A system as defined in claim 16, wherein the 
information the compiler receives from the software tool comprises data cache 
miss rates associated with the identified code region. 

19. (Original) A system as defined in claim 16, wherein the 
information the compiler receives from the software tool comprises memory 
load latency times associated with the identified code region. 

20. (Original) A system as defined in claim 16, wherein the helper 
thread is structured to prefetch variables contained in the identified code 
region. 

21. (Original) A system as defined in claim 16, wherein the set of 
compiler-runtime instructions comprises instructions to create the helper 
thread, terminate the helper thread, delay execution of the helper thread, and 
activate the helper thread. 

22. (Original) A system as defined in claim 16, wherein the set of 
compiler-runtime instructions comprises instructions to coordinate execution 
of the helper thread and the main thread. 
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23. (Original) A machine readable medium storing instructions to 
cause a machine to: 

analyze a software application including a main thread; 
identify a code region in the software application; 
generate a helper thread; 

generate and insert a first set of compiler-runtime instructions 
in the main thread to manage the helper thread and the main thread; 

generate and insert a second set of compiler-runtime 
instructions in the helper thread to manage the helper thread and the main 
thread; and 

manage execution points of the helper thread and the main 

thread. 

24. (Original) A machine readable medium as defined in claim 22, 
wherein the stored instructions are structured to cause the machine to identify 
the code region based on cache miss rates. 

25. (Original) A machine readable medium as defined in claim 22, 
wherein the stored instructions are structured to cause the machine to identify 
the code region based on memory load latencies. 

26. (Original) A machine readable medium as defined in claim 22, 
wherein the stored instructions are structured to cause the machine to generate 
the helper thread to prefetch instructions within the identified code region. 
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27. (Original) A machine readable medium as defined in claim 22, 
wherein the stored instructions are structured to cause the machine to generate 
compiler-runtime instructions to spawn the helper thread, terminate the helper 
thread, coordinate execution of the helper thread and the main thread. 

28. (Original) A machine readable medium as defined in claim 22, 
wherein the stored instructions are structured to cause the machine to manage 
the execution of the main thread and the helper thread by inserting a first 
portion of a counting mechanism in the main thread and a second portion of a 
counting mechanism in the helper thread. 

29. (Original) An apparatus to reduce memory latency, the 
apparatus comprising: 

a software tool structured to identify a code region in an 
application program that suffers from a data cache miss; 

a compiler operatively coupled to the software tool, the 
compiler being structured to receive information from the software tool and to 
generate a helper thread; 

a set of compiler-runtime instructions to be generated and 
inserted in the application program to manage the helper thread and to manage 
a main thread; and 

a counting mechanism for insertion in the main thread and the 
helper thread to facilitate coordination of execution points associated with the 
helper thread and the main thread. 
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30. (Original) An apparatus as defined in claim 29, wherein the 
information the compiler receives from the software tool comprises data cache 
miss rates associated with the identified code region. 

3 1 . (Original) An apparatus as defined in claim 29, wherein the 
information the compiler receives from the software tool comprises memory 
load latency times associated with the identified code region. 

32. (Original) An apparatus as defined in claim 29, wherein the 
helper thread is structured to prefetch variables contained in the identified 
code region. 

33. (Original) An apparatus as defined in claim 29, wherein the set 
of compiler-runtime instructions comprises instructions to create the helper 
thread, terminate the helper thread, delay execution of the helper thread, and 
activate the helper thread. 

34. (Original) An apparatus as defined in claim 29, wherein the set 
of compiler-runtime instructions comprises instructions to coordinate 
execution of the helper thread and the main thread. 
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